fix(ai-red-teaming): repair SDK config regression + restore local analytics by rdheekonda · Pull Request #34 · dreadnode/capabilities

rdheekonda · 2026-06-03T23:47:00Z

Summary

Fixes a regression that broke AI red-teaming attack workflows, restores consumable analytics, and hardens every tool so users never see raw tracebacks.

Part 1 — SDK config regression + local analytics

Root causes

dn.server AttributeError (codegen regression). The generated SDK-config block referenced a non-existent dn.server module attribute in its fallback path, raising an AttributeError that surfaced as a misleading FATAL: Could not configure SDK.
Wrong env-var contract. Config was gated only on DREADNODE_SERVER/DREADNODE_API_KEY; runtimes injecting DREADNODE_LLM_* (or relying on the saved profile) fell into the broken path even though dn.configure() resolves credentials itself.
No local analytics. Scripts printed "completed successfully" but persisted nothing, so results tools reported false failures.

Changes

_build_configure(): defer to dn.configure() (explicit > env > saved profile); read .server off the returned instance.
_resolve_platform_env(): also accept DREADNODE_LLM_BASE/DREADNODE_LLM_API_KEY.
New _build_analytics_writer(): run the SDK's deterministic analyze() over assessment.attack_results and persist a real *_analytics.json (no fabricated metrics) into the workspace dir the tools scan. Wired into all 7 templates.
results.py: envelope-aware parsing (ASR from execution_stats.overall_asr, trials from total_trials); validate_attack_results/get_analytics_summary are platform-aware (no hard failure for platform-only runs).
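
To make the resolution order concrete, here is a minimal sketch of "explicit > env > saved profile" with the extra DREADNODE_LLM_* fallback. It is illustrative only: in the PR the actual resolution is delegated to dn.configure(). The helper names `resolve_platform_env` and `pick_credentials`, the dict shapes, and the choice to prefer DREADNODE_SERVER over DREADNODE_LLM_BASE are assumptions, not the capability's code.

```python
from typing import Mapping, Optional, Tuple


def resolve_platform_env(env: Mapping[str, str]) -> Tuple[Optional[str], Optional[str]]:
    """Return (server, api_key) read from the environment; either may be None.

    Hypothetical helper. The variable names come from the PR text; which
    family wins when both are set is an assumption made for this sketch.
    """
    server = env.get("DREADNODE_SERVER") or env.get("DREADNODE_LLM_BASE")
    api_key = env.get("DREADNODE_API_KEY") or env.get("DREADNODE_LLM_API_KEY")
    return server, api_key


def pick_credentials(
    explicit: Mapping[str, str],
    env: Mapping[str, str],
    saved_profile: Mapping[str, str],
) -> Tuple[Optional[str], Optional[str]]:
    """Resolve each field independently: explicit > env > saved profile."""
    env_server, env_key = resolve_platform_env(env)
    server = explicit.get("server") or env_server or saved_profile.get("server")
    api_key = explicit.get("api_key") or env_key or saved_profile.get("api_key")
    return server, api_key
```

With this shape, a runtime that injects only the DREADNODE_LLM_* variables still resolves credentials instead of falling into a fallback path.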

Part 2 — Never surface raw tool errors to users

New tools/errors.py safe_tool wrapper: catches any unexpected exception in a tool and returns a clean, user-facing message; raw detail goes to stderr only. Preserves name/docstring/signature/annotations so tool schemas are unchanged. Loaded by file path (capability tool files are flat modules with no parent package).
Applied @safe_tool to all 21 tool entrypoints (assessment, attacks, goals, results, session, skills_manager, workflows).
Hardened previously-unguarded helpers to degrade gracefully:
- assessment._load(): missing/corrupt JSON → {}
- goals._load_goals(): missing/unreadable CSV → []
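
For readers unfamiliar with the pattern, a wrapper along these lines could look roughly like the sketch below. It is not the contents of tools/errors.py: the message wording is invented, and the step where the real decorator applies @tool internally is omitted because @tool comes from the SDK.

```python
import functools
import inspect
import sys
import traceback


def safe_tool(fn):
    """Return a wrapper that turns unexpected exceptions into a clean string."""

    def _report(exc: BaseException) -> str:
        # Full diagnostics go to stderr only; the caller sees a short message.
        traceback.print_exception(type(exc), exc, exc.__traceback__, file=sys.stderr)
        return f"{fn.__name__} could not complete: an unexpected error occurred."

    if inspect.iscoroutinefunction(fn):

        @functools.wraps(fn)  # keeps name, docstring, annotations, signature
        async def async_wrapper(*args, **kwargs):
            try:
                return await fn(*args, **kwargs)
            except Exception as exc:
                return _report(exc)

        return async_wrapper

    @functools.wraps(fn)
    def wrapper(*args, **kwargs):
        try:
            return fn(*args, **kwargs)
        except Exception as exc:
            return _report(exc)

    return wrapper
```

Because functools.wraps sets `__wrapped__`, `inspect.signature()` on the wrapper reports the original parameters, which is what keeps a schema derived from the signature stable.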

Verification

TAP on groq/meta-llama/llama-4-scout-17b-16e-instruct (attacker = judge = target, 10 iters):

SDK configured: server=… (no crash); standalone re-run exits 0.
[analytics] wrote local analytics: …/<id>_analytics.json; validate_attack_results → ✅; summary shows ASR 100%, Risk 8.0/10, 1 high-severity finding, 1 trial.

Tool hardening:

All 7 tool modules load under the real flat-module loader and expose all 21 tools.
Corrupt assessment file → "No assessment registered"; missing goals CSV → "Goals dataset not found"; forced PermissionError → clean safe_tool message, traceback to stderr only. No tracebacks reach the user.
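
The graceful-degradation behavior checked above can be sketched as follows. This is an assumption-laden illustration, not assessment._load() or goals._load_goals() themselves: the function names, the UTF-8 encoding, and the exact set of exceptions caught are choices made for the example.

```python
import csv
import json
from pathlib import Path


def load_json_or_empty(path):
    """Return the parsed JSON object, or {} if the file is missing or corrupt."""
    try:
        data = json.loads(Path(path).read_text(encoding="utf-8"))
    except (OSError, ValueError):
        # OSError covers missing/unreadable files; ValueError covers
        # json.JSONDecodeError and UnicodeDecodeError.
        return {}
    return data if isinstance(data, dict) else {}


def load_csv_rows_or_empty(path):
    """Return CSV rows as dicts, or [] if the file is missing or unreadable."""
    try:
        with open(path, newline="", encoding="utf-8") as handle:
            return list(csv.DictReader(handle))
    except (OSError, UnicodeDecodeError, csv.Error):
        return []
```

The caller can then map an empty result to a message such as "No assessment registered" rather than letting the exception propagate.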

All modified files py_compile cleanly. No behavior change for environments already setting DREADNODE_SERVER/DREADNODE_API_KEY; tool schemas unchanged.

…lytics

Generated attack workflows failed to configure the Dreadnode SDK and produced no consumable results.

Root causes:
- The codegen SDK-config block referenced a non-existent `dn.server` module attribute in its fallback path, raising AttributeError that surfaced as a misleading 'FATAL: Could not configure SDK'.
- It gated configuration on DREADNODE_SERVER/DREADNODE_API_KEY only, skipping its own working branch even when a valid saved profile or DREADNODE_LLM_* runtime env was present.
- Generated scripts never wrote local analytics, so inspect_results / validate_attack_results / get_analytics_summary reported false failures.

Fixes:
- Defer credential resolution to dn.configure() (explicit > env > profile) and read .server off the returned instance, not the module.
- _resolve_platform_env(): also recognize DREADNODE_LLM_BASE/_API_KEY.
- Generated workflows now run the SDK's deterministic analyze() over assessment.attack_results and persist a real *_analytics.json (no fabricated metrics) into the workspace dir the tools scan.
- results.py: parse the new analytics envelope (ASR from execution_stats.overall_asr, trials from total_trials) and make validate_attack_results / get_analytics_summary platform-aware so platform-only runs are not reported as hard failures.
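
As a rough illustration of the envelope-aware parsing described in the last fix, the sketch below pulls the two headline fields out of an analytics dict. Only the key names `execution_stats`, `overall_asr`, and `total_trials` come from the commit message. The function name and the location of `total_trials` (checked both inside `execution_stats` and at the top level) are assumptions, and the value is returned as stored without guessing its scale.

```python
from typing import Any, Dict


def summarize_analytics(envelope: Dict[str, Any]) -> Dict[str, Any]:
    """Extract ASR and trial count from an analytics envelope.

    Hypothetical helper: missing fields come back as None rather than raising,
    so a partially written file does not crash the caller.
    """
    stats = envelope.get("execution_stats") or {}
    asr = stats.get("overall_asr")
    # Assumption: total_trials may live beside overall_asr or at the top level.
    trials = stats.get("total_trials", envelope.get("total_trials"))
    return {"asr": asr, "trials": trials}
```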

@tool

Add a shared safe_tool wrapper and apply it to all 21 tool entrypoints so any unexpected exception is caught and returned as a clean, user-facing message instead of a raw traceback. Diagnostics go to stderr only.

- tools/errors.py: new safe_tool decorator. Wraps sync/async tool fns, preserves name/docstring/signature/annotations (via functools.wraps) so the generated tool schema is unchanged, then applies @tool internally. Loaded by file path because capability tool files are imported as flat modules with no parent package (relative imports are unavailable).
- Replace @tool -> @safe_tool across assessment, attacks, goals, results, session, skills_manager, workflows.
- Harden previously-unguarded helpers so common recoverable cases degrade gracefully instead of raising:
  * assessment._load(): tolerate missing/corrupt JSON -> {}.
  * goals._load_goals(): tolerate missing/unreadable CSV -> [].

Verified: all 7 tool modules load under the real flat-module loader and expose all 21 tools; corrupt-file, missing-dataset and forced-exception paths all return clean strings with no traceback.
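
Loading a sibling file by path, as mentioned for tools/errors.py, is typically done with importlib. The sketch below shows one common way using only the standard library; the function name and the module name passed in are hypothetical, and the capability's own loader may differ.

```python
import importlib.util
import sys


def load_module_from_path(name, path):
    """Import a standalone .py file as module `name` without a parent package."""
    spec = importlib.util.spec_from_file_location(name, str(path))
    if spec is None or spec.loader is None:
        raise ImportError(f"cannot load {name!r} from {path}")
    module = importlib.util.module_from_spec(spec)
    # Register before executing so the module can be found during its own import.
    sys.modules[name] = module
    try:
        spec.loader.exec_module(module)
    except BaseException:
        sys.modules.pop(name, None)  # do not leave a half-initialized module behind
        raise
    return module


# Hypothetical usage from inside a flat tool module:
#   errors = load_module_from_path("_airt_errors", Path(__file__).with_name("errors.py"))
#   safe_tool = errors.safe_tool
```

This avoids `from .errors import safe_tool`, which fails when the importing file has no parent package.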

…from display; add user-POV run sequence

Metric clarity:
- Present ASR (attack success rate) as the headline success-probability metric (0-100% / 0-1) in get_assessment_status and get_analytics_summary.
- Stop surfacing the severity-weighted 0-10 risk score to users. It is computed in the SDK and kept in the raw data / accepted by update_assessment_status for platform parity, but no longer displayed. (True P(success) is ASR; the /10 score is a separate severity measure, so showing both was confusing.)

UX:
- Greeting now includes a small 5-step user-POV sequence (Plan -> Generate -> Run -> Score -> Report) plus a one-line ASR explanation.
- Agent instructed to print a single-line plan before launching a run.

Note: this is a presentation-layer change in the capability; the SDK's risk_score computation is unchanged.
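
For reference, the relationship between the 0-1 and 0-100% forms of ASR can be sketched as below. This assumes the usual definition of attack success rate as successful trials divided by total trials; how the SDK actually derives the value is not shown in this PR, and both function names are hypothetical.

```python
def attack_success_rate(successes: int, trials: int) -> float:
    """Return ASR as a fraction in [0, 1] (assumed definition: successes / trials)."""
    if trials <= 0:
        raise ValueError("trials must be positive")
    if not 0 <= successes <= trials:
        raise ValueError("successes must be between 0 and trials")
    return successes / trials


def format_asr(asr: float) -> str:
    """Render a 0-1 ASR as a whole-number percentage for display."""
    return f"ASR {asr:.0%}"
```

Under that definition, a single successful trial out of one formats as "ASR 100%", matching the shape of the summary reported in the verification run.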

rdheekonda added 4 commits June 3, 2026 23:46

chore(ai-red-teaming): bump version 1.3.5 -> 1.3.6

45a88fb

Patch release covering the SDK-config regression fix, restored local analytics, and the safe_tool error-hardening in this PR.

rdheekonda merged commit 1760a77 into main Jun 4, 2026
5 checks passed

rdheekonda merged 4 commits into main from fix/airt-sdk-config-and-local-analytics
